Managing Word Form Variation of Text Retrieval in Practice - why Five Character Truncation Takes it all?

Author

  • Kimmo Kettunen
Abstract

This paper discusses the methods that have been used to manage word form variation over the history of textual information retrieval (IR). These techniques have been characterized in many ways. We pinpoint the most meaningful features of the approaches and make comparisons that have practical value. In the discussion we characterize word form variation management methods in different ways and offer the reader an overall practical guide for choosing among them.
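The "five character truncation" named in the title is among the simplest ways to conflate word forms: each token is cut to its first five characters, so that inflected variants sharing a prefix map to the same index key. The following is a minimal sketch of that idea only. The function names, the lowercasing step, and the whitespace tokenization are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def truncate(word: str, n: int = 5) -> str:
    """Return the first n characters of a lowercased word.

    Words shorter than n characters are returned unchanged (lowercased).
    The default n=5 mirrors the truncation length named in the title.
    """
    return word.lower()[:n]


def truncate_terms(text: str, n: int = 5) -> List[str]:
    """Split text on whitespace (an assumed, naive tokenizer) and truncate each token."""
    return [truncate(token, n) for token in text.split()]


if __name__ == "__main__":
    # Inflected variants that share a five-character prefix collapse to one key.
    print(truncate_terms("retrieval retrieving retrieved"))
```

The same function would be applied to both document terms and query terms, so that matching happens on the truncated keys. Truncation of this kind needs no lexicon or morphological analyzer, which is the practical trade-off against stemming and lemmatization.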

Similar resources

Developing an Automatic Linguistic Truncation Operator for Best-match Retrieval of Finnish in Inflected Word Form Text Database Indexes

The paper presents a new method for handling morphological variation of query terms in best-match IR. The method is based on enhanced inflectional stems. The use of inflectional stems has earlier been shown to be a good retrieval method in inflected indexes in a best-match environment for Finnish, a highly inflected and compound-rich language. In this paper the earlier stem method is elaborated ...

Investigating the Role of Types of Homograph Contexts in Determining Similarity Between Documents

Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation and thereby improve the effectiveness of the results retrieved. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts can be divided into different types, wit...

All It Takes for Corruption in Health Systems to Triumph, Is Good People Who Do Nothing; Comment on “We Need to Talk About Corruption in Health Systems”

Numerous investigations demonstrate that the problem of corruption in the health sector is enormous and has grave negative consequences for patients. Nevertheless, the problem of corruption in health systems is far from eminent in the international health policy debate. Hutchinson, Balabanova, and McKee have identified in their Editorial five reasons why the health policy community has been relu...

Using Linguistic Knowledge in Information Retrieval Technical Report

The current practice in Information Retrieval is largely based on statistical techniques. These techniques are reasonably successful but many researchers believe that statistical techniques have reached their upper bound. Some recent research in IR is aimed at investigating whether Natural Language Processing techniques can be used to improve the performance of existing retrieval strategies. In...

The Effects of Indexing Strategy-query Term Combination on Retrieval Effectiveness in a Swedish Full Text Database

This thesis deals with Swedish full text retrieval and the problem of morphological variation of query terms in the document database. The study is an information retrieval experiment with a test collection. Since no Swedish test collection was available, one was constructed. It consists of a document database containing 161,336 news articles, and 52 topics with four-graded (0, 1,...

Publication date: 2012